In this report, a steering algorithm (A-CMA) used in phased array antennas is implemented on both a CPU and a GPU using MATLAB and CUDA. Both implementations are profiled to identify their bottlenecks. It is shown that for a small number of receiving antennas, the CPU implementation is faster than the GPU implementation; for a large number of antennas, however, the GPU implementation is far more efficient.